In this section we will learn to search for and download DNA methylation (epigenetic) and gene expression (transcription) data from the newly created NCI Genomic Data Commons (GDC) portal and prepare them as a SummarizedExperiment object.
The figure below highlights the part of the workflow covered in this section.
First we will launch the GUI for TCGAbiolinks.
library(TCGAbiolinksGUI)
TCGAbiolinksGUI()
After launching the GUI, select GDC Data/Get GDC data/Molecular data.
Fill in the search fields with the information shown below and click on Visualize Data. If you select Filter using clinical data under the clinical filter, the clinical information will also be plotted.
A plot summarizing the data will be shown. For more details, you can also open the GDC search results: Results section.
After the query is completed, you can download the data and convert it to an R object in the Download & Prepare section. If the download succeeds, a message will report where the data was saved.
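For readers who prefer the command line, the query, download, and prepare steps performed by the GUI can be sketched with the underlying TCGAbiolinks functions `GDCquery`, `GDCdownload`, and `GDCprepare`. This is a minimal sketch, not the GUI's actual implementation: the wrapper name `fetch_gdc_data`, the default file name, and the example argument values are assumptions made for illustration, and accepted argument values can differ between TCGAbiolinks versions.

```r
# Hypothetical wrapper around the TCGAbiolinks query -> download -> prepare steps.
# Extra search fields (e.g. platform, data.type) are passed through `...`.
fetch_gdc_data <- function(project, data.category, ...,
                           save.filename = "gdc_data.rda") {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("Package 'TCGAbiolinks' is required.")
  }
  # Search GDC for files matching the requested fields.
  query <- TCGAbiolinks::GDCquery(project = project,
                                  data.category = data.category,
                                  ...)
  # Download the matching files (needs network access).
  TCGAbiolinks::GDCdownload(query)
  # Read the files into an R object and save it to `save.filename`.
  TCGAbiolinks::GDCprepare(query, save = TRUE, save.filename = save.filename)
}

# Not run here because it needs network access; the values are illustrative
# only and should be replaced by the ones entered in the GUI search fields.
# se <- fetch_gdc_data(project = "TCGA-ACC",
#                      data.category = "DNA Methylation",
#                      platform = "Illumina Human Methylation 450")
```

Calling the package functions with `::` keeps the sketch loadable even when TCGAbiolinks is not attached; the error is raised only when the function is actually called without the package installed.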
## Visualizing the SummarizedExperiment
The integrative data container SummarizedExperiment (Morgan et al., n.d.; Huber et al. 2015) contains three matrices: one with sample metadata, one with features metadata, and one with the assay data.
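To make the three components concrete, the following sketch builds a toy SummarizedExperiment by hand and reads each part back with the standard accessors. All probe names, sample names, and values are made up for illustration and do not come from GDC.

```r
library(SummarizedExperiment)

# Assay data: 3 features (rows) x 2 samples (columns); values are made up.
beta <- matrix(c(0.1, 0.8, 0.5, 0.2, 0.9, 0.4), nrow = 3,
               dimnames = list(c("probe1", "probe2", "probe3"),
                               c("sampleA", "sampleB")))

# Sample metadata: one row per column of the assay matrix.
samples <- DataFrame(group = c("tumor", "normal"),
                     row.names = colnames(beta))

# Features metadata: one row per row of the assay matrix.
features <- DataFrame(gene = c("GENE1", "GENE2", "GENE3"),
                      row.names = rownames(beta))

se <- SummarizedExperiment(assays = list(beta = beta),
                           rowData = features,
                           colData = samples)

dim(se)       # number of features, number of samples
colData(se)   # sample metadata
rowData(se)   # features metadata
assay(se)     # assay matrix
```

Because the three parts share row and column names, subsetting the object (for example `se[, "sampleA"]`) keeps the assay and both metadata tables in sync.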
To visualize the SummarizedExperiment object select GDC Data/Manage SummarizedExperiment:
Then click on Select Summarized Experiment file and choose the file downloaded from GDC.
You can access the sample metadata, the assay data, or the features metadata.
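The same three views can be reached from an R session. The sketch below assumes the saved file is an R data file written with `save()`; to stay self-contained it first writes a made-up stand-in object to a temporary file, so the file path, object name, and values are all hypothetical.

```r
library(SummarizedExperiment)

# Stand-in for the object prepared from GDC (all values are made up).
toy <- SummarizedExperiment(
  assays  = list(counts = matrix(1:6, nrow = 3,
                                 dimnames = list(paste0("gene", 1:3),
                                                 c("s1", "s2")))),
  rowData = DataFrame(symbol = c("A", "B", "C"),
                      row.names = paste0("gene", 1:3)),
  colData = DataFrame(condition = c("tumor", "normal"),
                      row.names = c("s1", "s2"))
)

# Hypothetical file; replace with the path reported after the download.
se_file <- tempfile(fileext = ".rda")
save(toy, file = se_file)

# load() returns the names of the restored objects, so get() retrieves the
# object without knowing what it was called when it was saved.
env <- new.env()
se <- get(load(se_file, envir = env)[1], envir = env)

sample_info  <- as.data.frame(colData(se))   # sample metadata
assay_matrix <- assay(se)                    # assay data
feature_info <- as.data.frame(rowData(se))   # features metadata
```

Loading into a separate environment avoids overwriting objects in the workspace that happen to share the saved object's name.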
Again, fill in the search fields with the information shown below and click on Visualize Data. If you select Filter using clinical data under the clinical filter, the clinical information will also be plotted.
A plot summarizing the data will be shown.
After the query is completed, you can download the data and convert it to an R object in the Download & Prepare section.
If the download succeeds, a message will report where the data was saved.
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Bioc2017.TCGAbiolinks.ELMER_0.0.0.9000
## [2] SummarizedExperiment_1.6.3
## [3] DelayedArray_0.2.7
## [4] matrixStats_0.52.2
## [5] Biobase_2.36.2
## [6] GenomicRanges_1.28.3
## [7] GenomeInfoDb_1.12.2
## [8] IRanges_2.10.2
## [9] S4Vectors_0.14.3
## [10] BiocGenerics_0.22.0
## [11] TCGAbiolinks_2.5.6
## [12] bindrcpp_0.2
## [13] dplyr_0.7.1
## [14] DT_0.2
## [15] ELMER_2.0.1
## [16] ELMER.data_2.0.1
## [17] MultiAssayExperiment_1.2.1
##
## loaded via a namespace (and not attached):
## [1] shinydashboard_0.6.1 R.utils_2.5.0
## [3] RSQLite_2.0 AnnotationDbi_1.38.1
## [5] htmlwidgets_0.9 grid_3.4.0
## [7] trimcluster_0.1-2 BiocParallel_1.10.1
## [9] devtools_1.13.2 DESeq_1.28.0
## [11] munsell_0.4.3 codetools_0.2-15
## [13] withr_1.0.2 colorspace_1.3-2
## [15] BiocInstaller_1.26.0 knitr_1.16
## [17] robustbase_0.92-7 labeling_0.3
## [19] GenomeInfoDbData_0.99.0 KMsurv_0.1-5
## [21] mnormt_1.5-5 hwriter_1.3.2
## [23] bit64_0.9-7 rprojroot_1.2
## [25] downloader_0.4 biovizBase_1.24.0
## [27] ggthemes_3.4.0 EDASeq_2.10.0
## [29] diptest_0.75-7 R6_2.2.2
## [31] doParallel_1.0.10 locfit_1.5-9.1
## [33] AnnotationFilter_1.0.0 flexmix_2.3-14
## [35] reshape_0.8.6 bitops_1.0-6
## [37] assertthat_0.2.0 scales_0.4.1
## [39] nnet_7.3-12 gtable_0.2.0
## [41] ensembldb_2.0.3 rlang_0.1.1
## [43] genefilter_1.58.1 cmprsk_2.2-7
## [45] GlobalOptions_0.0.12 splines_3.4.0
## [47] rtracklayer_1.36.3 lazyeval_0.2.0
## [49] acepack_1.4.1 dichromat_2.0-0
## [51] selectr_0.3-1 broom_0.4.2
## [53] checkmate_1.8.3 yaml_2.1.14
## [55] reshape2_1.4.2 GenomicFeatures_1.28.4
## [57] backports_1.1.0 httpuv_1.3.5
## [59] Hmisc_4.0-3 tools_3.4.0
## [61] psych_1.7.5 ggplot2_2.2.1
## [63] RColorBrewer_1.1-2 Rcpp_0.12.11
## [65] plyr_1.8.4 base64enc_0.1-3
## [67] zlibbioc_1.22.0 purrr_0.2.2.2
## [69] RCurl_1.95-4.8 ggpubr_0.1.4
## [71] rpart_4.1-11 GetoptLong_0.1.6
## [73] viridis_0.4.0 zoo_1.8-0
## [75] ggrepel_0.6.5 cluster_2.0.6
## [77] magrittr_1.5 data.table_1.10.4
## [79] circlize_0.4.0 survminer_0.4.0
## [81] mvtnorm_1.0-6 whisker_0.3-2
## [83] ProtGenerics_1.8.0 aroma.light_3.6.0
## [85] hms_0.3 mime_0.5
## [87] evaluate_0.10.1 xtable_1.8-2
## [89] XML_3.98-1.9 mclust_5.3
## [91] gridExtra_2.2.1 shape_1.4.2
## [93] compiler_3.4.0 biomaRt_2.32.1
## [95] tibble_1.3.3 R.oo_1.21.0
## [97] htmltools_0.3.6 Formula_1.2-2
## [99] tidyr_0.6.3 geneplotter_1.54.0
## [101] DBI_0.7 matlab_1.0.2
## [103] ComplexHeatmap_1.14.0 MASS_7.3-47
## [105] fpc_2.1-10 BiocStyle_2.4.0
## [107] ShortRead_1.34.0 Matrix_1.2-10
## [109] readr_1.1.1 R.methodsS3_1.7.1
## [111] Gviz_1.20.0 bindr_0.1
## [113] km.ci_0.5-2 pkgconfig_2.0.1
## [115] GenomicAlignments_1.12.1 foreign_0.8-69
## [117] plotly_4.7.0 xml2_1.1.1
## [119] roxygen2_6.0.1 foreach_1.4.3
## [121] annotate_1.54.0 XVector_0.16.0
## [123] rvest_0.3.2 stringr_1.2.0
## [125] VariantAnnotation_1.22.3 digest_0.6.12
## [127] ConsensusClusterPlus_1.40.0 Biostrings_2.44.1
## [129] rmarkdown_1.6 survMisc_0.5.4
## [131] htmlTable_1.9 dendextend_1.5.2
## [133] edgeR_3.18.1 curl_2.7
## [135] kernlab_0.9-25 shiny_1.0.3
## [137] Rsamtools_1.28.0 commonmark_1.2
## [139] modeltools_0.2-21 rjson_0.2.15
## [141] nlme_3.1-131 jsonlite_1.5
## [143] viridisLite_0.2.0 limma_3.32.2
## [145] BSgenome_1.44.0 lattice_0.20-35
## [147] httr_1.2.1 DEoptimR_1.0-8
## [149] survival_2.41-3 interactiveDisplayBase_1.14.0
## [151] glue_1.1.1 UpSetR_1.3.3
## [153] prabclus_2.2-6 iterators_1.0.8
## [155] bit_1.1-12 class_7.3-14
## [157] stringi_1.1.5 blob_1.1.0
## [159] AnnotationHub_2.8.2 latticeExtra_0.6-28
## [161] memoise_1.1.0
Huber, Wolfgang, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2). Nature Publishing Group: 115–21.
Morgan M, Hester J, Obenchain V, and Pagès H. n.d. “SummarizedExperiment: SummarizedExperiment Container. R Package Version 1.1.0.” http://bioconductor.org/packages/SummarizedExperiment/.